The arrival process of bidders and bids in online auctions is important
for studying and modeling supply and demand in the online
marketplace. A popular assumption in the online auction literature is 
that a Poisson bidder arrival process is a reasonable approximation. 
This approximation underlies theoretical derivations, statistical mod- 
els and simulations used in field studies. However, when it comes to 
the bid arrivals, empirical research has shown that the process is far 
from Poisson, with early bidding and last-moment bids taking place. 
An additional feature that has been reported by various authors is an 
apparent self-similarity in the bid arrival process. Despite the wide 
evidence for the changing bidding intensities and the self-similarity, 
there has been no rigorous attempt at developing a model that ad- 
equately approximates bid arrivals and accounts for these features. 
The goal of this paper is to introduce a family of distributions that 
well-approximate the bid time distribution in hard-close auctions. 
We call this the BARISTA process (Bid ARrivals In STAges) be- 
cause of its ability to generate different intensities at different stages. 
We describe the properties of this model, show how to simulate bid 
arrivals from it, and how to use it for estimation and inference. We 
illustrate its power and usefulness by fitting simulated and real data 
from eBay.com. Finally, we show how a Poisson bidder arrival process 
relates to a BARISTA bid arrival process. 

1. Introduction and motivation. Empirical research on online auctions
has been flourishing in recent years due to the important role that these 
auctions play in the marketplace, and the availability of large amounts of 
high-quality bid data from websites such as eBay, Yahoo!, OnSale and uBid. 
Many of the theoretical results derived for traditional (offline) auctions have 
been shown to fail in the online setting for reasons such as globalism, comput- 
erized bidding and the recording of complex bids, longer auction durations, 
more flexibility in design choice by the seller and issues of trust. A central
factor underlying many important results is the number of bidders partic- 
ipating in the auction. Typically, it is assumed that this number is fixed 
[Pinker et al. (2003)] or fixed but unknown [McAfee and McMillan (1997)]. 
In online auctions the number of bidders and bids is not predetermined, and 
it is known to be affected by the auction design and its dynamics. Thus, in 
both the theoretical and empirical domains the number of bidders and bids 
plays an important role. 

We propose a new and flexible model for the bid arrival process. Hav- 
ing a model for bid arrivals has several important implications. First, many 
researchers in the online auction arena use simulated bid arrival data to vali- 
date their results. Bapna et al. (2002), for example, use simulated bid arrival 
data to validate their model on a bidder's willingness to pay. Gwebu et al. 
(2005) design a complex simulation study to analyze bidders' strategies us- 
ing assumptions about the bidder, as well as bid arrival rates. It has also 
been noted that the placement of bids influences the bidder arrival process 
[Beam et al. (1996)]. Hlasny (2006) reviews several econometric procedures 
for testing for the presence of latent shill-bidding (where sellers fraudulently 
bid on their own item) based on the arrival rate of bids. While a clear un- 
derstanding of the process of bidding can have an impact on the theoretical 
literature, it can also be useful in many applications. These range from auto- 
mated electronic negotiation through monitoring auction server performance 
to designing automated bidding agents. For instance, Menasce and Akula 
(2004) study the connection between bid arrivals and auction server per- 
formance. They find that the commonly encountered "last minute bidding" 
creates a huge surge in the workload of auction servers and degrades their 
performance. They then suggest a model to improve a server's performance 
through auction rescheduling using simulated bid arrival data. 

Modeling the bid arrival process rather than the bidder arrivals also 
promises to produce more reliable results, because bid placements are typ- 
ically completely observable from the auction's bid history, whereas bidder 
arrivals are not. eBay, for instance, posts the temporal sequence of all the 
bids placed over the course of the auction. In particular, every time a bid 
is placed its exact time-stamp is posted. In contrast, the time when bidders 
first arrive at an auction is unobservable from the bid history. Bidders can 
browse an auction without placing a bid, thereby not leaving a trace or re- 
vealing their interest in that auction. That is, they can look at a particular 
auction, inform themselves about current bid- and competition-levels in that 
auction, and make decisions about their bidding strategies. All this activity 
can take place without leaving an observable trace in the bid history that 
the auction site makes public. In fact, it is likely that bidders first browse an 
auction and only later place their bid. The gap between the bidder arrival 
time and bid placement also means that the bidder arrival is not identical to 
the bid arrival, and therefore cannot be inferred directly from the observed
bid times. Another issue is that most online auctions allow bid-revision, 
and therefore many bidders place multiple bids. This further adds to the 
obscurity of defining the bidder arrival-departure process. Our approach is 
therefore to model the bid arrival process based on empirical evidence. 

The current literature, based on publicly-available bid data, reports strong 
evidence of two major features of the bid arrival process in online auctions: 
(1) a nonhomogeneous intensity that possesses two or three distinct stages,
and (2) a self-similarity effect in the distribution of bid arrival times. We 
describe these in Section 2. However, aside from noting these features, no 
model has been suggested for approximating the bid arrival process that 
addresses these two features. In light of the absence of such a model, we 
introduce the BARISTA process, a model that well-approximates bid arrivals 
in online auctions. Section 3 introduces the model and its properties, and 
describes two special cases. Section 4 describes a method for simulating data 
from this process and several methods for estimating model parameters. In 
Section 5 we use simulated data and a diverse set of real bid data from eBay 
to illustrate the estimation and model fit. In addition to the various uses 
of the bid arrival model, one might be able to infer bidder strategies from 
the aggregate bid arrival process. In Section 6 we tie the bid and bidder 
arrival processes, proposing several bidding strategies that would lead to 
BARISTA-type bid arrivals. In Section 7 we conclude and suggest future 
enhancements. 

2. Features of bid arrivals. We start by describing two prominent fea- 
tures of bid arrivals that have been reported in the literature, and follow 
with an illustration using bid data from eBay. 

2.1. Multi-stage arrival intensities. Time-limited tasks are omnipresent
in the offline world: voting for a new president, purchasing tickets for a 
popular movie or sporting event, filing one's federal taxes, etc. In many of 
these cases arrivals are especially intense as the deadline approaches. For 
instance, during the 2001 political elections in Italy, more than 20 million 
voters cast their ballots between 13:00-22:00 [Bruschi et al. (2002)], when 
ballots were scheduled to close at 22:00. Similarly, a high proportion of 
U.S. tax returns are filed near the 15 April deadline. For instance, about 
one-third of all returns are not filed until the last two weeks of tax season
(www.heraldstandard.com/site/news.cfm?newsid=14359378&BRD=2280&PAG=461&dept_id=480247).

According to Ariely et al. (2005), deadline effects have been noted in stud- 
ies of bargaining, where agreements are reached in the final moments be- 
fore the deadline [Roth et al. (1998)]. Such effects have been shown among 
animals, which respond more vigorously toward the expected end of a rein- 
forcement schedule, and in human task completion where individuals be- 
come increasingly impatient toward the task's end. Furthermore, people 
use different strategies when games are framed as getting close to the end
[even when these are arbitrary break points; Croson (1996)]. In addition to 
the deadline effect, there is an effect of earliness where the strategic use of 
time moves transactions earlier than later, for example, in the labor market 
[Roth and Xing (1994); Avery et al. (2001)]. 

Such deadline and earliness effects have also been observed in the on- 
line environment. Several researchers have noted deadline effects in internet 
auctions [Bajari and Hortacsu (2000); Borle et al. (2006); Ku et al. (2004); 
Roth and Ockenfels (2000); Wilcox (2000)]. In many of these studies it was 
observed that a nonnegligible percent of bids arrive at the very last minute 
of the auction. This phenomenon, called "bid sniping," has received much 
attention, and numerous explanations have been suggested to explain its 
existence. Empirical studies of online auctions have also reported an un- 
usual amount of bidding activity at the auction start followed by a longer 
period of little or no activity [Borle et al. (2006); Jank and Shmueli (2007)]. 
Bapna et al. (2003) refer to bidders who place a single early bid as "eval- 
uators." Finally, "bid shilling," a fraudulent act where the seller places 
dummy bids to drive up the price, is associated with early and high bidding 
[Kauffman and Wood (2000)]. These bid-timing phenomena
are important factors in determining outcomes at the auction level, as well
as at the market level. They have therefore received much attention from 
the research community. 

2.2. Self-similarity (and its breakdown). While both the offline and online
environments share the deadline and earliness effects, the online environment
appears to possess the additional property of self-similarity in the bid
arrival process [this property was also found in the offline process of bargain- 
ing agreements, as described in Roth and Ockenfels (2000)]. Self-similarity 
refers to the "striking regularity" of shape that can be found among the dis- 
tribution of bid arrivals over the intervals [t,T], as t approaches the auction 
deadline T. Self-similarity is central in applications such as web, network and 
ethernet traffic. Huberman and Adamic (1999) found that the number of 
visitors to websites follows a universal power law. Liebovitch and Schwartz 
(2003) reported that the arrival process of email viruses is self-similar.
Self-similarity has also been reported in other online environments. For instance,
Aurell and Hemmingsson (1997) showed that times between bids in the in- 
terbank foreign exchange market follow a power law distribution. 

Several authors reported results that indicate the presence of self-similarity 
in the bidding frequency in online auctions. Roth and Ockenfels (2000) found 
that the arrival of last bids by bidders during an online auction is closely 
related to a self-similar process. They approximated the CDF of bid arrivals
in "reverse time" (i.e., the CDF of the elapsed times between the bid arrivals
and the auction deadline) by the power functional form $F_T(t) = (t/T)^{\alpha}$
($\alpha > 0$), over the interval $[0,T]$, and estimated $\alpha$ from the data using OLS.
This approximates the distribution of bids over intervals that range from 
the last 12 hours to the last 10 minutes, but accounts for neither the final 
minutes of the auction nor the auction start and middle. Yang et al. (2003) 
found that the number of bids and the number of bidders in auctions on eBay 
and on its Korean partner (auction.co.kr) follow a power law distribution.
This was found for auctions across multiple categories. The importance of 
this finding, which is closely related to the self-similarity property, is that the 
more bidding one observes up to a fixed time point, the higher the likelihood 
of seeing another bid before the next time point. According to Yang et al. 
(2003), such power-law behaviors imply that the online auction system is 
driven by self-organized processes, involving all bidders who participate in 
a given auction activity. 

The implications of bid arrivals following a self-similar process instead of 
an ordinary Poisson model are significant: The levels of activity throughout 
an auction with self-similar bid arrivals would increase at a much faster rate 
than expected under a Poisson model. It would be especially meaningful 
toward the end of the auction, which has a large impact on the bid amount 
process and the final price. The self-similar property suggests that the rate 
of incoming bids increases steadily as the auction approaches its end. In- 
deed, empirical investigations have found that many bidders wait until the 
very last possible moment to submit their final bid. By doing so, they hope 
to increase their chance of winning the auction since the probability that 
another competitor successfully places an even higher bid before closing is 
diminishing. This common bidding strategy of "bid sniping" (or "last minute 
bidding") would suggest a steadily increasing flow of bid arrivals toward the
auction end. However, empirical evidence from online auction data indicates 
that bid times over the last minute or so of hard-close auctions tend to fol- 
low a uniform distribution [Roth et al. (1998)]. This has not been found in 
soft-close, or "going-going-gone" auctions, such as those on Amazon, Yahoo! 
or uBid.com, where the auction continues several minutes after the last bid 
is placed. 

Thus, in addition to the evidence for self-similarity in online auctions, 
there is also evidence of its breakdown during the very last moments of a 
hard-close auction. Roth and Ockenfels (2000) note that the empirical CDF 
plots for intervals that range between the last 12 hours of the auction and 
the last 1 minute all look very similar except for the last 1-minute plot. 
Being able to model this breakdown is essential, since the last moments of 
the auction (when sniping takes place) are known to be crucial to the final 
price. In the absence of such a model, we introduce a bid arrival process 
that describes the frequency throughout the entire auction. Rather than 
focusing on the last several hours and excluding the last moments, our model 
accommodates the changes in bidding intensity from the very start to the
very end of a hard-close auction. 

To illustrate the self-similarity in the bid arrival process in online auc- 
tions, we collected data on 189 7-day auctions (with a total of 3651 bid 
times) on eBay.com for new Palm M515 Personal Digital Assistants. Fig- 
ure 1 displays the empirical CDFs for the 3651 bid arrivals for the purposes 
of examining the self-similarity property. The CDF is plotted at several res- 
olutions, "zooming-in" from the entire auction duration (of 7 days) to the 
last day, last 12-hours, 6-hours, 3-hours, 5-minutes, 2-minutes and the very 
last minute. We see that (1) the last day curve (thick black) is different 
from the other curves in that it starts out concave, (2) the last day through 
last 3-hour curves (red) are all very similar to each other, and (3) the last 
minutes curves (grey) gradually approach the 1-minute curve which is nearly 
uniform. These visual similarities are confirmed by the results of two-sample 
Kolmogorov-Smirnov tests where we compared all the pairs of distributions
and found similarities only among the curves within each of the three groups. 

This replicates the results in Roth and Ockenfels (2000) where self-similarity 
was observed in the bid time distributions of the last 12-hour, 6-hour,
3-hour, 1-hour, 30-minute and 10-minute periods of the auction, and where
this self-similarity breaks down in the last minute of the auction to become 
a uniform distribution. However, we examine a few additional time reso- 
lutions which give further insight: First, by looking at the last 5-minutes 
and last 2-minutes bid distributions, we see that the self-similarity gradu- 
ally transitions into the 1-minute uniform distribution. Second, our inspec- 
tion of the entire auction duration [which was unavailable in the study by 
Roth and Ockenfels (2000)] reveals an additional early-bidding stage. Self- 
similarity, it appears, is not prevalent throughout the entire auction dura- 
tion! Such a phenomenon can occur if the probability of a bid not getting 
registered on the auction site is positive at the last moments of the auction, 
and increases as the auction comes to a close. There are various factors that 
may cause a bid to not get registered. One possible reason is the time it 
takes to manually place a bid [Roth and Ockenfels (2000) found that most 
last minute bidders tend to place their bids manually rather than through 
available sniping software agents]. Other reasons are hardware difficulties, 
Internet congestion, unexpected latency and server problems on eBay (see, 
e.g., www.auctionsniper.com). Clearly, the closer to the end the auction 
gets, the higher the likelihood that a bid will not get registered successfully. 
This increasing likelihood of an unsuccessful bid counteracts the increasing 
flow of last minute bids. The result is a uniform bid arrival process that
"contaminates" the self-similarity of the arrivals until that point.

In the next section we introduce a flexible nonhomogeneous Poisson pro- 
cess (NHPP) that captures the empirical phenomenon described above. In 
addition to the self-similarity, it also accounts for the two observed phenom- 
ena of "early bidding" and "last minute bidding" (sniping). 
Fig. 1. Empirical CDFs of number of bids in 189 Palm M515 auctions overlaid.



3. The BARISTA: A three-stage nonhomogeneous Poisson process. We
introduce a process that captures the two main features of arrivals in online
auctions: the three stages and the self-similarity (with its breakdown). We 
call this the BARISTA (Bid ARrivals In STAges) process, because it gen- 
erates different intensities of activity [we also call the stages the "espresso 
stage" (short and intense), "macchiato stage" (stained) and "ristretto stage" 
(extra intense), and hence the BARISTA]: 

Stage 1. The auction start, characterized by an early burst of activity, 
Stage 2. The mid-auction bid arrivals, characterized by increasing bid inten- 
sity and self-similarity that is gradually contaminated as the 3rd stage is 
approached, and 

Stage 3. The last moments of the auction, characterized by very intense 
activity dampened possibly by bids that are not successfully transmitted. 



3.1. Model formulation. A nonhomogeneous Poisson process differs from 
an ordinary Poisson process in that its intensity is not a constant, but rather 
a function of time. We introduce a particular intensity function that captures 
the three-stage dynamics described above. Suppose bids arrive during $[0,T]$
in accordance with a nonhomogeneous Poisson process $N(s)$, $0 \le s \le T$, with
intensity function

$$
\lambda(s) =
\begin{cases}
c\left(1-\dfrac{d_1}{T}\right)^{\alpha_2-\alpha_1}\left(1-\dfrac{s}{T}\right)^{\alpha_1-1}, & \text{for } 0 \le s \le d_1, \\[2mm]
c\left(1-\dfrac{s}{T}\right)^{\alpha_2-1}, & \text{for } d_1 \le s \le T-d_2, \\[2mm]
c\left(\dfrac{d_2}{T}\right)^{\alpha_2-\alpha_3}\left(1-\dfrac{s}{T}\right)^{\alpha_3-1}, & \text{for } T-d_2 \le s \le T,
\end{cases}
\tag{1}
$$

where $c > 0$, $\alpha_j > 0$ for $j = 1, 2, 3$, $T$ is a known constant, and
$0 \le d_1 < T - d_2 \le T$. Note that this intensity function is continuous, so there are no jumps
at times $d_1$ and $T - d_2$. The random variable $N(s)$, which counts the number
of arrivals until time $s$, follows a Poisson distribution with mean



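$$
m(s) =
\begin{cases}
K\left[1-\left(1-\dfrac{s}{T}\right)^{\alpha_1}\right], & \text{for } 0 \le s \le d_1, \\[2mm]
K\left[1-\left(1-\dfrac{d_1}{T}\right)^{\alpha_1}\right]
 + \dfrac{Tc}{\alpha_2}\left[\left(1-\dfrac{d_1}{T}\right)^{\alpha_2}-\left(1-\dfrac{s}{T}\right)^{\alpha_2}\right], & \text{for } d_1 \le s \le T-d_2, \\[2mm]
K\left[1-\left(1-\dfrac{d_1}{T}\right)^{\alpha_1}\right]
 + \dfrac{Tc}{\alpha_2}\left[\left(1-\dfrac{d_1}{T}\right)^{\alpha_2}-\left(\dfrac{d_2}{T}\right)^{\alpha_2}\right]
 + \dfrac{Tc}{\alpha_3}\left(\dfrac{d_2}{T}\right)^{\alpha_2-\alpha_3}\left[\left(\dfrac{d_2}{T}\right)^{\alpha_3}-\left(1-\dfrac{s}{T}\right)^{\alpha_3}\right], & \text{for } T-d_2 \le s \le T,
\end{cases}
\tag{2}
$$

where $K = \dfrac{Tc}{\alpha_1}\left(1-\dfrac{d_1}{T}\right)^{\alpha_2-\alpha_1}$.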
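As an illustration of equations (1) and (2), the following Python sketch evaluates the intensity and the mean function. It is not the authors' Matlab code; the function names, the argument order and the parameter value $d_2 = 5/1440$ (five minutes expressed in days) used in the usage note are assumptions made here for illustration only.

```python
def barista_intensity(s, c, a1, a2, a3, d1, d2, T):
    """Intensity lambda(s) of equation (1), for 0 <= s < T."""
    u = 1.0 - s / T                       # fraction of the auction remaining
    if s <= d1:                           # stage 1: early bidding
        return c * (1.0 - d1 / T) ** (a2 - a1) * u ** (a1 - 1.0)
    if s <= T - d2:                       # stage 2: self-similar middle
        return c * u ** (a2 - 1.0)
    return c * (d2 / T) ** (a2 - a3) * u ** (a3 - 1.0)   # stage 3: last moments


def barista_mean(s, c, a1, a2, a3, d1, d2, T):
    """Mean function m(s) of equation (2): the integral of lambda over [0, s]."""
    u = 1.0 - s / T
    a = 1.0 - d1 / T
    b = d2 / T
    K = T * c / a1 * a ** (a2 - a1)
    if s <= d1:
        return K * (1.0 - u ** a1)
    m_stage1 = K * (1.0 - a ** a1)                 # m(d1)
    if s <= T - d2:
        return m_stage1 + T * c / a2 * (a ** a2 - u ** a2)
    m_stage2 = T * c / a2 * (a ** a2 - b ** a2)    # m(T - d2) - m(d1)
    return (m_stage1 + m_stage2
            + T * c / a3 * b ** (a2 - a3) * (b ** a3 - u ** a3))
```

A simple sanity check is to integrate `barista_intensity` numerically (for example with a midpoint rule) and compare the result with `barista_mean` at the same time point; the two should agree up to the quadrature error.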

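Given that $N(T) = n$, the collection of arrival times is equivalent to the
order statistics of a random sample of size $n$ from the distribution with
CDF $F(s) = m(s)/m(T)$:

$$
F(s) =
\begin{cases}
\dfrac{CT}{\alpha_1}\left(1-\dfrac{d_1}{T}\right)^{\alpha_2-\alpha_1}\left[1-\left(1-\dfrac{s}{T}\right)^{\alpha_1}\right], & \text{for } 0 \le s \le d_1, \\[2mm]
\dfrac{CT}{\alpha_1\alpha_2}\left\{(\alpha_1-\alpha_2)\left(1-\dfrac{d_1}{T}\right)^{\alpha_2}
 + \alpha_2\left(1-\dfrac{d_1}{T}\right)^{\alpha_2-\alpha_1}
 - \alpha_1\left(1-\dfrac{s}{T}\right)^{\alpha_2}\right\}, & \text{for } d_1 \le s \le T-d_2, \\[2mm]
1-\dfrac{CT}{\alpha_3}\left(\dfrac{d_2}{T}\right)^{\alpha_2-\alpha_3}\left(1-\dfrac{s}{T}\right)^{\alpha_3}, & \text{for } T-d_2 \le s \le T.
\end{cases}
\tag{3}
$$

Note that for the interval $d_1 \le s \le T - d_2$ we can write the CDF as

$$
F(s) = F(d_1) + \frac{CT}{\alpha_2}\left[\left(1-\frac{d_1}{T}\right)^{\alpha_2}-\left(1-\frac{s}{T}\right)^{\alpha_2}\right],
\tag{4}
$$

where

$$
C = \frac{c}{m(T)} =
\frac{\alpha_1\alpha_2\alpha_3/T}
{\left(1-\frac{d_1}{T}\right)^{\alpha_2}\alpha_3(\alpha_1-\alpha_2)
 + \alpha_3\alpha_2\left(1-\frac{d_1}{T}\right)^{\alpha_2-\alpha_1}
 + \left(\frac{d_2}{T}\right)^{\alpha_2}\alpha_1(\alpha_2-\alpha_3)}.
$$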
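The CDF in (3)-(4) and the constant $C$ can be sketched in a few lines of Python. This is only an illustrative implementation under the definitions above; the names `barista_C` and `barista_cdf` are hypothetical and do not come from the authors' code.

```python
def barista_C(a1, a2, a3, d1, d2, T):
    """Normalizing constant C = c / m(T) from Section 3.1."""
    a = 1.0 - d1 / T
    b = d2 / T
    denom = (a3 * (a1 - a2) * a ** a2
             + a3 * a2 * a ** (a2 - a1)
             + a1 * (a2 - a3) * b ** a2)
    return a1 * a2 * a3 / T / denom


def barista_cdf(s, a1, a2, a3, d1, d2, T):
    """CDF F(s) of equation (3), for 0 <= s <= T."""
    C = barista_C(a1, a2, a3, d1, d2, T)
    u = 1.0 - s / T
    a = 1.0 - d1 / T
    b = d2 / T
    if s <= d1:                                   # first branch of (3)
        return C * T / a1 * a ** (a2 - a1) * (1.0 - u ** a1)
    if s <= T - d2:                               # equation (4)
        F_d1 = C * T / a1 * a ** (a2 - a1) * (1.0 - a ** a1)
        return F_d1 + C * T / a2 * (a ** a2 - u ** a2)
    return 1.0 - C * T / a3 * b ** (a2 - a3) * u ** a3   # third branch of (3)
```

Because $C$ is defined so that $F(T) = 1$, the second and third branches meet at $s = T - d_2$; checking this numerically is a quick way to catch a transcription error in any of the exponents.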
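The density function corresponding to this process is given by

$$
f(s) =
\begin{cases}
C\left(1-\dfrac{d_1}{T}\right)^{\alpha_2-\alpha_1}\left(1-\dfrac{s}{T}\right)^{\alpha_1-1}, & \text{for } 0 \le s \le d_1, \\[2mm]
C\left(1-\dfrac{s}{T}\right)^{\alpha_2-1}, & \text{for } d_1 \le s \le T-d_2, \\[2mm]
C\left(\dfrac{d_2}{T}\right)^{\alpha_2-\alpha_3}\left(1-\dfrac{s}{T}\right)^{\alpha_3-1}, & \text{for } T-d_2 \le s \le T.
\end{cases}
\tag{5}
$$

We expect $\alpha_3$ to be close to 1 (uniform arrival of bids at the end of the
auction) and $\alpha_1 > 1$ to represent the early surge in bidding.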



3.2. Properties of the BARISTA process. The process described by (1)- 
(4) has two properties that lead to a wide family of processes, and that can 
be useful in practice. We describe each property and its implications below. 



3.2.1. An additive property. If $N_k$, $1 \le k \le m$, are independent BARISTA
processes having $c$ parameters $c_1, \ldots, c_m$ and common $(\alpha_1, \alpha_2, \alpha_3)$, $(d_1, d_2)$
and $T$ parameters, then the aggregated process $N = \sum_{1 \le k \le m} N_k$ is a BARISTA
with parameters $(\alpha_1, \alpha_2, \alpha_3)$, $(d_1, d_2)$, $T$, and $c = \sum_{1 \le k \le m} c_k$.

This means that the bid arrival times from several auctions may be aggre- 
gated and treated as though they were generated by a single auction, pro- 
vided that each original auction can be regarded as producing a BARISTA 
process with the same (or nearly the same) parameters. The advantage of 
aggregation is more accurate parameter estimation. 
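The additive property is an instance of the superposition property of independent Poisson processes. As a short worked step, write each intensity as $\lambda_k(s) = c_k\, g(s)$, where $g(s) = \lambda(s)/c$ collects the factors in (1) that do not depend on $c$ (the symbols $\lambda_k$ and $\lambda_{\mathrm{agg}}$ are notation introduced here for illustration only):

```latex
\lambda_{\mathrm{agg}}(s)
  \;=\; \sum_{k=1}^{m} \lambda_k(s)
  \;=\; \sum_{k=1}^{m} c_k\, g(s)
  \;=\; \Bigl(\sum_{k=1}^{m} c_k\Bigr)\, g(s),
  \qquad 0 \le s \le T .
```

Because $g$ depends only on the shared $(\alpha_1, \alpha_2, \alpha_3)$, $(d_1, d_2)$ and $T$, the aggregate intensity has the form (1) with $c = \sum_k c_k$.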



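3.2.2. A regenerative property. An observer who counts only the bid arrivals
occurring after time $\beta T$, for some $0 \le \beta < 1$, sees the process

$$
N_\beta(s) := N(s) - N(\beta T), \qquad \beta T \le s \le T.
$$

$N_\beta$ is an NHPP with intensity function $\lambda_\beta = \lambda$, restricted to the interval
$[\beta T, T)$.

Taking $\beta T$ as the new zero, and recording time on a new (faster) clock
where one new minute (a shminute) = $(1 - \beta)$ minutes on the original clock,
we can write $\lambda_\beta$ as

$$
\lambda_\beta(s) =
\begin{cases}
c_\beta\left(1-\dfrac{d_{1,\beta}}{T_\beta}\right)^{\alpha_2-\alpha_1}\left(1-\dfrac{s}{T_\beta}\right)^{\alpha_1-1}, & \text{for } 0 \le s \le d_{1,\beta}, \\[2mm]
c_\beta\left(1-\dfrac{s}{T_\beta}\right)^{\alpha_2-1}, & \text{for } d_{1,\beta} \le s \le T_\beta-d_{2,\beta}, \\[2mm]
c_\beta\left(\dfrac{d_{2,\beta}}{T_\beta}\right)^{\alpha_2-\alpha_3}\left(1-\dfrac{s}{T_\beta}\right)^{\alpha_3-1}, & \text{for } T_\beta-d_{2,\beta} \le s \le T_\beta,
\end{cases}
$$

where $c_\beta = c(1-\beta)^{\alpha_2-1}$, $d_{1,\beta} = \max(d_1 - \beta T, 0)$, $d_{2,\beta} = \min(d_2, T_\beta)$, and
$T_\beta = (1-\beta)T$. Thus, $\lambda_\beta$ has the same form as our original $\lambda$, with $\lambda = \lambda_0$ in
the new notation.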
This regenerative property means that we can use the BARISTA model 
to approximate bid arrivals in an ongoing auction, not only in a completed 
auction. One application where this is useful is in real-time forecasting of 
future bid times, such as for the purpose of optimizing server performance. 
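The regenerative property can be checked numerically: the original intensity evaluated at $\beta T + u$ should equal the restarted intensity evaluated at $u$. The sketch below is an illustration under the parameter mapping $c_\beta$, $d_{1,\beta}$, $d_{2,\beta}$, $T_\beta$ stated above; the function names are hypothetical, and the check is only meant for observers who start before the third stage ($\beta T \le T - d_2$).

```python
def barista_intensity(s, c, a1, a2, a3, d1, d2, T):
    """Intensity lambda(s) of equation (1), for 0 <= s < T."""
    u = 1.0 - s / T
    if s <= d1:
        return c * (1.0 - d1 / T) ** (a2 - a1) * u ** (a1 - 1.0)
    if s <= T - d2:
        return c * u ** (a2 - 1.0)
    return c * (d2 / T) ** (a2 - a3) * u ** (a3 - 1.0)


def restarted_parameters(beta, c, a1, a2, a3, d1, d2, T):
    """Parameters of the process seen by an observer who starts at time beta*T."""
    T_beta = (1.0 - beta) * T
    return dict(
        c=c * (1.0 - beta) ** (a2 - 1.0),   # c_beta
        a1=a1, a2=a2, a3=a3,                # the alpha parameters are unchanged
        d1=max(d1 - beta * T, 0.0),         # d_{1,beta}
        d2=min(d2, T_beta),                 # d_{2,beta}
        T=T_beta,                           # T_beta
    )


def max_relative_gap(beta, params, fractions=(0.0, 0.1, 0.3, 0.6, 0.9, 0.9999)):
    """Largest relative difference between lambda(beta*T + u) and lambda_beta(u)."""
    q = restarted_parameters(beta, **params)
    gap = 0.0
    for frac in fractions:
        u = frac * q["T"]
        original = barista_intensity(beta * params["T"] + u, **params)
        restarted = barista_intensity(u, **q)
        gap = max(gap, abs(original - restarted) / abs(original))
    return gap
```

In a real-time forecasting setting, `restarted_parameters` is the piece that would be reused: it converts parameters estimated on completed auctions into the parameters governing the remainder of an ongoing one.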

3.3. Special cases. In empirical studies, $d_1$ appears to be small (1-2 days)
and $d_2$ very small (a few minutes) compared to $T$ (several days). Thus, most
of the BARISTA process is realized in the second stage, during which the
process can be regarded as having contaminated self-similarity. The contamination
is caused by the bid arrivals in the third stage, and increases as
$s \to T - d_2$.

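When $d_1 = d_2 = 0$, the BARISTA process reduces to a single-stage process
(NHPP1) with an intensity function $\lambda(s) = c(1 - s/T)^{\alpha-1}$ and associated CDF
function $F(s) = 1 - (1 - s/T)^{\alpha}$, $0 \le s \le T$. For $(\beta, t) \in [0,1] \times (0,T]$, we have

$$
\frac{1 - F(T - \beta t)}{1 - F(T - t)} = \beta^{\alpha} \quad (\text{independent of } t),
$$

and thus, we have a pure self-similar process. The joint MLE of $(\alpha, c)$ is
obtainable in this case (see Appendix A):

$$
\hat{\alpha} = -N(T)\left[\sum_{i=1}^{N(T)} \ln\left(1 - \frac{X_i}{T}\right)\right]^{-1},
\qquad
\hat{c} = \frac{N(T)\,\hat{\alpha}}{T}.
$$

Since $X \sim F \Rightarrow -\ln(1 - X/T) \sim \exp(\text{rate} = \alpha)$, and $\lim_{c \to \infty} P(N(T) \to \infty) = 1$,
a conditioning argument on $N(T)$ yields an asymptotic result:

$$
\sqrt{N(T)}\left(\frac{\hat{\alpha}}{\alpha} - 1\right) \xrightarrow{d} N(0,1) \quad \text{as } c \to \infty,
$$

where $N(0, 1)$ indicates a standard normal distribution.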
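For the single-stage case, both the simulation and the MLE have closed forms, so the estimator is easy to try out. The sketch below is a minimal illustration, assuming the NHPP1 CDF $F(s) = 1 - (1 - s/T)^{\alpha}$ and the estimator $\hat{\alpha} = -N(T)/\sum_i \ln(1 - X_i/T)$; the function names and the use of Python's `random` module are choices made here, not part of the original study.

```python
import math
import random


def simulate_nhpp1(n, alpha, T, rng):
    """Draw n arrival times by inverting F(s) = 1 - (1 - s/T)**alpha."""
    times = []
    for _ in range(n):
        survival = 1.0 - rng.random()          # uniform on (0, 1], plays the role of 1 - F
        times.append(T * (1.0 - survival ** (1.0 / alpha)))
    return times


def mle_alpha(times, T):
    """MLE of alpha: minus the count divided by the sum of ln(1 - x/T)."""
    return -len(times) / sum(math.log(1.0 - x / T) for x in times)


def mle_c(times, T):
    """Plug-in estimate of c obtained from E[N(T)] = m(T) = c*T/alpha."""
    return len(times) * mle_alpha(times, T) / T
```

By the asymptotic result above, the relative error of $\hat{\alpha}$ should shrink roughly like $1/\sqrt{N(T)}$, which can be used to judge whether a given number of bids is enough for a stable estimate.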

When $d_1 = 0$, the BARISTA process reduces to a two-stage process (NHPP2),
with a single changepoint at $T - d_2$. This process is useful for modeling bid
arrivals in auctions that lack the initial surge of early bidding. For further 
technical details on these special cases, see Shmueli et al. (2004). 

4. Fitting the BARISTA to data. Simulated bid arrivals are useful in
field experiments, in evaluation of model fit, and for quantifying sampling
error. The simulation method described in Section 4.1 is simple to program
and computationally efficient.
Fitting the BARISTA process to data requires estimating the two
changepoints and three $\alpha$ parameters. We introduce three estimation methods
that range in their computational intensiveness and accuracy (Matlab
code for the simulation and estimation procedures is available at
http://www.smith.umd.edu/ceme/statistics/code.html).
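
4.1. Process simulation. To simulate $n$ observations from the BARISTA
process on the interval $[0, T]$, we use the inversion method and apply the
inverse CDF to a simulated random sample of $U(0, 1)$ variates. In particular,
the inverse CDF can be written as

$$
F^{-1}(s) =
\begin{cases}
T - T\left\{1 - \dfrac{\alpha_1 s}{CT}\left(1-\dfrac{d_1}{T}\right)^{\alpha_1-\alpha_2}\right\}^{1/\alpha_1}, & \text{for } 0 \le s \le F(d_1), \\[2mm]
T - T\left\{\left(1-\dfrac{d_1}{T}\right)^{\alpha_2} - \dfrac{\alpha_2}{CT}\bigl(s - F(d_1)\bigr)\right\}^{1/\alpha_2}, & \text{for } F(d_1) \le s \le F(T-d_2), \\[2mm]
T - T\left\{\dfrac{\alpha_3}{CT}(1-s)\left(\dfrac{d_2}{T}\right)^{\alpha_3-\alpha_2}\right\}^{1/\alpha_3}, & \text{for } F(T-d_2) \le s \le 1,
\end{cases}
\tag{6}
$$

where the argument $s$ is a probability, so the three branches are delimited by
$F(d_1)$ and $F(T - d_2)$ rather than by the changepoints themselves.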

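The algorithm for generating $n$ arrivals $(x_1, \ldots, x_n)$ is then:

(1) Generate $n$ uniform variates $u_1, \ldots, u_n$.

(2) For $k = 1, \ldots, n$, set

$$
x_k =
\begin{cases}
T - T\left\{1 - \dfrac{\alpha_1 u_k}{CT}\left(1-\dfrac{d_1}{T}\right)^{\alpha_1-\alpha_2}\right\}^{1/\alpha_1}, & \text{if } u_k < F(d_1), \\[2mm]
T - T\left\{\dfrac{\alpha_2}{CT}\bigl(F(d_1) - u_k\bigr) + \left(1-\dfrac{d_1}{T}\right)^{\alpha_2}\right\}^{1/\alpha_2}, & \text{if } F(d_1) \le u_k < F(T-d_2), \\[2mm]
T - T\left\{\dfrac{\alpha_3}{CT}(1-u_k)\left(\dfrac{d_2}{T}\right)^{\alpha_3-\alpha_2}\right\}^{1/\alpha_3}, & \text{if } u_k \ge F(T-d_2).
\end{cases}
\tag{7}
$$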

Note that we fix the number of bid arrivals (n) rather than a randomly 
generated number, since the estimators are of the same form in both cases 
(see Appendix A). In order to generate a random number of bids under the 
BARISTA model, one would generate a variate from a Poisson distribution 
with mean m(T) [see equation (2)]. 
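The inversion algorithm in steps (1)-(2) can be sketched in Python as follows. This is an illustrative re-implementation of equation (7), not the Matlab code referred to above; the function names, the optional `rng` argument and the guard against tiny negative rounding errors are assumptions added here.

```python
import random


def barista_constants(a1, a2, a3, d1, d2, T):
    """Return C of Section 3.1 together with F(d1) and F(T - d2)."""
    a = 1.0 - d1 / T
    b = d2 / T
    denom = (a3 * (a1 - a2) * a ** a2 + a3 * a2 * a ** (a2 - a1)
             + a1 * (a2 - a3) * b ** a2)
    C = a1 * a2 * a3 / T / denom
    F_d1 = C * T / a1 * a ** (a2 - a1) * (1.0 - a ** a1)
    F_Td2 = 1.0 - C * T / a3 * b ** a2
    return C, F_d1, F_Td2


def barista_inverse_cdf(u, a1, a2, a3, d1, d2, T):
    """Map a probability u in [0, 1) to an arrival time, following equation (7)."""
    C, F_d1, F_Td2 = barista_constants(a1, a2, a3, d1, d2, T)
    a = 1.0 - d1 / T
    b = d2 / T
    if u < F_d1:                                   # stage 1
        inner = 1.0 - a1 * u / (C * T) * a ** (a1 - a2)
        return T - T * max(inner, 0.0) ** (1.0 / a1)
    if u < F_Td2:                                  # stage 2
        inner = a2 / (C * T) * (F_d1 - u) + a ** a2
        return T - T * max(inner, 0.0) ** (1.0 / a2)   # max() guards rounding below zero
    inner = a3 / (C * T) * (1.0 - u) * b ** (a3 - a2)  # stage 3
    return T - T * inner ** (1.0 / a3)


def simulate_barista(n, a1, a2, a3, d1, d2, T, rng=None):
    """Steps (1)-(2): draw n uniforms and push them through the inverse CDF."""
    rng = rng or random.Random()
    return sorted(barista_inverse_cdf(rng.random(), a1, a2, a3, d1, d2, T)
                  for _ in range(n))
```

To draw a random number of bids instead of a fixed $n$, one would first sample $n$ from a Poisson distribution with mean $m(T)$, as noted above, and then call `simulate_barista` with that $n$.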

4.2. Parameter estimation. We describe three estimation methods each 
having a different tradeoff between computational intensity and accuracy, 
and with varying amounts of required user input. 

4.2.1. Quick and crude (CDF-based) estimation. The estimation of the
$\alpha$ parameters depends on the changepoints $d_1$, $T - d_2$ and vice versa. As a
crude start, we choose three intervals of the form $[T - t, T - s]$ that we are
confident lie in the first, second or third stages, and use those for estimating
the $\alpha$ parameters. We then use the $\alpha$ estimates to obtain estimates for the
changepoints.

In both cases the estimates are based on writing the parameters as a 
function of the CDF, and then substituting the empirical CDF to obtain 
estimates. 
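
Estimation of $\alpha$ parameters. From (3) it can be seen that in each interval
the CDF of the BARISTA process is, up to an additive constant that cancels
when differences are taken, of the form $F(t) = \theta_j - \theta_j(1 - t/T)^{\alpha_j}$
($j = 1, 2, 3$), and therefore the same approximation works on each of the
three intervals $[0, d_1]$, $[d_1, T - d_2]$ and $[T - d_2, T]$. After choosing intervals
$[T - t, T - s]$ that we are confident lie in stage $j$ (the first, second or third
stage), we have

$$
\begin{aligned}
\frac{F(T-t) - F(T-\sqrt{ts})}{F(T-\sqrt{ts}) - F(T-s)}
&= \frac{\theta_j\bigl[1-(1-(T-t)/T)^{\alpha_j}\bigr] - \theta_j\bigl[1-(1-(T-\sqrt{ts})/T)^{\alpha_j}\bigr]}
        {\theta_j\bigl[1-(1-(T-\sqrt{ts})/T)^{\alpha_j}\bigr] - \theta_j\bigl[1-(1-(T-s)/T)^{\alpha_j}\bigr]} \\
&= \frac{(ts)^{\alpha_j/2} - t^{\alpha_j}}{s^{\alpha_j} - (ts)^{\alpha_j/2}}
 = \frac{(s^{\alpha_j/2} - t^{\alpha_j/2})\,t^{\alpha_j/2}}{(s^{\alpha_j/2} - t^{\alpha_j/2})\,s^{\alpha_j/2}}
 = \left(\frac{t}{s}\right)^{\alpha_j/2}.
\end{aligned}
\tag{8}
$$

The relevant $\alpha$ is given by

$$
\alpha_j = 2\,\frac{\ln[F(T-t) - F(T-\sqrt{ts})] - \ln[F(T-\sqrt{ts}) - F(T-s)]}{\ln t - \ln s}.
\tag{9}
$$

For $t > s$ both differences inside the logarithms are negative, so in practice
the logarithms are applied to their absolute values; the ratio in (8) is
unaffected. We then estimate $\alpha_j$ by substituting $F$ with the empirical CDF
$F_e(t) = N(t)/N(T)$ in the approximation.

For $\alpha_3$, we can use the exact relation

$$
\alpha_3 = \frac{\ln[R(t_3)/R(t_3')]}{\ln[(T - t_3)/(T - t_3')]},
\tag{10}
$$

where $R(t) = 1 - F(t)$ and $t_3, t_3'$ are within $[T - d_2, T]$. To estimate $\alpha_3$, we
choose reasonable values of $t_3, t_3'$ and use the empirical survival function
$R_e = 1 - F_e$.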

Obtaining standard errors for these estimators can be done by bootstrap- 
ping [see Efron and Tibshirani (1993) for details], due to the low computa- 
tional effort involved in this estimation method. 
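The quick and crude estimators in (9) and (10) only need a CDF that can be evaluated at a few points, so they are short to code. The sketch below is a hedged illustration: the function names are hypothetical, the absolute values inside the logarithms are an implementation choice made here so that the logs are defined when $t > s$, and `empirical_cdf` is one simple way to build $F_e$ from observed bid times.

```python
import bisect
import math


def empirical_cdf(times):
    """Return F_e as a function: the share of observed bid times <= t."""
    xs = sorted(times)
    n = len(xs)
    return lambda t: bisect.bisect_right(xs, t) / n


def alpha_from_cdf(F, t, s, T):
    """Equation (9). t > s > 0 are times before the deadline, so the
    interval [T - t, T - s] should lie inside a single stage."""
    g = math.sqrt(t * s)                       # geometric midpoint sqrt(ts)
    num = abs(F(T - t) - F(T - g))
    den = abs(F(T - g) - F(T - s))
    return 2.0 * (math.log(num) - math.log(den)) / (math.log(t) - math.log(s))


def alpha3_from_survival(F, t3, t3_prime, T):
    """Equation (10). t3 and t3_prime are time points in [T - d2, T)."""
    R = lambda x: 1.0 - F(x)                   # survival function R = 1 - F
    return (math.log(R(t3) / R(t3_prime))
            / math.log((T - t3) / (T - t3_prime)))
```

With real data one would pass `empirical_cdf(bid_times)` as `F`; bootstrap standard errors then amount to resampling `bid_times` with replacement and repeating the call.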

To assess this method, we simulated 5000 random observations from
the BARISTA process on the interval $[0, 7]$ with parameters $\alpha_1 = 3$, $\alpha_2 = 0.4$,
$\alpha_3 = 1$ and the changepoints $d_1 = 2.5$ (defining the first 2.5 days as the
first stage) and $d_2 = 5/10080$ (defining the last 5 minutes as the third stage).
The intensity function for these data is shown in Figure 2, and parameter
estimates with their standard errors are given in Table 1.

To study the robustness of the estimators to the choice of $t$ and $s$, we
computed the quick and crude estimate for $\alpha_1$ on a range of intervals of the
form $[0.001, t_1]$, where $0.5 \le t_1 \le 5$. Note that this interval includes values
that are outside the range $[0, d_1 = 2.5]$. The left panel in Figure 3 illustrates
the estimates obtained for these intervals. For values of $t_1$ between 1.5-3.5
days, the estimate for $\alpha_1$ is relatively stable and close to 3. Similarly, the
right panel in Figure 3 describes the estimates of $\alpha_3$, using (10), as a function
of the choice of $t_3$ with $t_3' = 7 - 1/10080$. The estimate is relatively stable
and close to 1.

For estimating $\alpha_2$, an interval such as $[3, 6.9]$ is reasonable. Figure 4 shows
the estimate as a function of the interval choice. It is clear that the estimate 
is relatively insensitive to the exact interval choice, as long as it is reason- 
able. 

Estimation of $d_1$ and $d_2$. Using functions of the CDF, we obtain expressions
for $d_1$ and $d_2$. Let $t_1, t_2', t_2$ and $t_3$ be such that $0 \le t_1 \le d_1$, $d_1 \le t_2' < t_2 \le T - d_2$,
and $T - d_2 \le t_3 \le T$. For $d_1$, we use the ratio $F(t_1)/[F(t_2) - F(t_2')]$ and for
$d_2$, we use the ratio $[1 - F(t_3)]/[F(t_2) - F(t_2')]$. These lead to the following expressions:

$$
d_1 = T - T\left\{\frac{\alpha_1}{\alpha_2}\cdot\frac{F(t_1)}{F(t_2) - F(t_2')}\cdot
\frac{(1 - t_2'/T)^{\alpha_2} - (1 - t_2/T)^{\alpha_2}}{1 - (1 - t_1/T)^{\alpha_1}}\right\}^{1/(\alpha_2 - \alpha_1)},
\tag{11}
$$

$$
d_2 = T\left\{\frac{\alpha_3}{\alpha_2}\cdot\frac{1 - F(t_3)}{F(t_2) - F(t_2')}\cdot
\frac{(1 - t_2'/T)^{\alpha_2} - (1 - t_2/T)^{\alpha_2}}{(1 - t_3/T)^{\alpha_3}}\right\}^{1/(\alpha_2 - \alpha_3)}.
\tag{12}
$$

We can therefore estimate $d_1$ and $d_2$ by selecting "safe" values for $t_1, t_2', t_2$
and $t_3$ (which are confidently within the relevant interval) and using the
empirical CDF at those points.

Fig. 2. Intensity function $\lambda(s)$ for simulated data, where $\alpha_1 = 3$, $\alpha_2 = 0.4$, $\alpha_3 = 1$,
$d_1 = 2.5$, $T - d_2 = 7 - 5/10080$, and $c = 1$. Note the different time scale for the last 5 minutes
(right panel).

Fig. 3. Quick estimates of $\alpha_1$, $\alpha_2$ and $\alpha_3$ as a function of the input intervals, for simulated
BARISTA process data. [Horizontal axis of the right panel: $T - t$ (in minutes before end).]
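Expressions (11) and (12) can be verified by feeding them the exact BARISTA CDF, in which case they should return the true changepoints. The following Python sketch does this; it is an illustration only, with hypothetical function names, and with $d_2 = 5/1440$ and $t_3 = T - 2/1440$ chosen here as five and two minutes expressed in days.

```python
def barista_cdf(s, a1, a2, a3, d1, d2, T):
    """Exact CDF of equation (3); used here only to check (11) and (12)."""
    a = 1.0 - d1 / T
    b = d2 / T
    u = 1.0 - s / T
    denom = (a3 * (a1 - a2) * a ** a2 + a3 * a2 * a ** (a2 - a1)
             + a1 * (a2 - a3) * b ** a2)
    C = a1 * a2 * a3 / T / denom
    if s <= d1:
        return C * T / a1 * a ** (a2 - a1) * (1.0 - u ** a1)
    if s <= T - d2:
        F_d1 = C * T / a1 * a ** (a2 - a1) * (1.0 - a ** a1)
        return F_d1 + C * T / a2 * (a ** a2 - u ** a2)
    return 1.0 - C * T / a3 * b ** (a2 - a3) * u ** a3


def estimate_d1(F, t1, t2_prime, t2, a1, a2, T):
    """Equation (11): t1 in stage 1, t2_prime < t2 both in stage 2."""
    ratio = F(t1) / (F(t2) - F(t2_prime))
    shape = (((1.0 - t2_prime / T) ** a2 - (1.0 - t2 / T) ** a2)
             / (1.0 - (1.0 - t1 / T) ** a1))
    return T - T * (a1 / a2 * ratio * shape) ** (1.0 / (a2 - a1))


def estimate_d2(F, t3, t2_prime, t2, a2, a3, T):
    """Equation (12): t3 in stage 3, t2_prime < t2 both in stage 2."""
    ratio = (1.0 - F(t3)) / (F(t2) - F(t2_prime))
    shape = (((1.0 - t2_prime / T) ** a2 - (1.0 - t2 / T) ** a2)
             / (1.0 - t3 / T) ** a3)
    return T * (a3 / a2 * ratio * shape) ** (1.0 / (a2 - a3))
```

With data, `F` would be the empirical CDF and the $\alpha$ arguments would be the quick and crude estimates from the previous step. Both formulas divide by $\alpha_2 - \alpha_1$ or $\alpha_2 - \alpha_3$ in the exponent, so they become unstable when adjacent stages have nearly equal $\alpha$ values.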

Using this method, we estimated $d_1$ and $d_2$ for the simulated data. We
used the true values of the $\alpha$ parameters and the "safe" values $t_1 = 1$, $t_2' = 3$,
$t_2 = 6$, and $t_3 = 7 - 2/10080$. The estimates and their (bootstrap) standard
errors are reported in Table 1. Figure 5 shows the robustness of the estimates
to the choice of the "safe" values. It can be seen that $d_1$ estimates are stable
between 2.4-2.6 even if we choose $t_1$ slightly outside of the first interval
$[0, 2.5]$. $d_2$ estimates are between 3-5.5 minutes even when $t_3$ is dislocated
by a few minutes into the second interval.

Fig. 4. Quick and crude estimate of $\alpha_2$ as a function of the $[t_2', t_2]$ choice. $\hat\alpha_2$ is between
0.4-0.55 in the entire range of intervals. The more extreme intervals ($t_2' < 3.4$ or $t_2 > 6.8$)
yield $\hat\alpha_2 = 0.4$.

Fig. 5. Graphs of $\hat d_1$ vs. $t_1$ (left) and $\hat d_2$ vs. initial values of $T - t_3$ (right) for simulated
data. The estimate for $d_1$ is stable at $\approx 2.5$. $\hat d_2$ using the last 2-5 minute interval is in the
range of 4-5 minutes.

Estimation of c. The estimate of the parameter $c$ is based on the estimate
$\hat\theta$ of the other parameters $\theta = (\alpha_1, \alpha_2, \alpha_3, d_1, d_2)$, and the observed number
$N(T)$ of bids placed on $[0, T]$. Define $g(\theta; s) = \lambda(s)/c$, $0 \le s \le T$, where $\lambda$ is
the function in (1). We have

\[
N(T) \approx E[N(T)] = c\int_0^T g(\theta; s)\,ds \approx c\int_0^T g(\hat\theta; s)\,ds.
\]

Solving for $c$, we obtain the estimate

\[
\hat c = \frac{N(T)}{\int_0^T g(\hat\theta; s)\,ds}.
\]

If $\hat\theta$ is an MLE of $\theta$, then $\hat c$ is an MLE of $c$.
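For the three-stage intensity implied by (13), the integral in the denominator has a closed form, so $\hat c$ can be computed directly. The following Python sketch shows this under that assumption; the helper names `barista_mass` and `estimate_c` are illustrative and do not come from the paper.

```python
def barista_mass(a1, a2, a3, d1, d2, T):
    """Closed form of the integral of g(theta; s) = lambda(s)/c over [0, T].

    Assumes the three-stage form implied by the log-likelihood (13).
    """
    A, B = 1.0 - d1 / T, d2 / T
    stage1 = A ** (a2 - a1) * (1.0 - A ** a1) / a1   # early bidding, [0, d1]
    stage2 = (A ** a2 - B ** a2) / a2                # middle, [d1, T - d2]
    stage3 = B ** a2 / a3                            # last moments, [T - d2, T]
    return T * (stage1 + stage2 + stage3)


def estimate_c(n_bids, a1, a2, a3, d1, d2, T):
    """Plug-in estimate of c: observed bid count over the integral of g."""
    return n_bids / barista_mass(a1, a2, a3, d1, d2, T)
```

When all three $\alpha$ values equal 1 the intensity is constant, the integral equals $T$, and the estimate reduces to the familiar rate $N(T)/T$.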

4.2.2. Maximum likelihood estimation. Conditional on $N(T) = n$ (see
Appendix A for unconditional estimation), the BARISTA log-likelihood
function is given by

\[
\begin{aligned}
\mathcal{L}(x_1,\ldots,x_n \mid \alpha_1,\alpha_2,\alpha_3,d_1,d_2)
&= n\ln C + n_1(\alpha_2-\alpha_1)\ln\left(1-\frac{d_1}{T}\right) + n_3(\alpha_2-\alpha_3)\ln\frac{d_2}{T} \\
&\quad + (\alpha_1-1)S_1 + (\alpha_2-1)S_2 + (\alpha_3-1)S_3,
\end{aligned}
\tag{13}
\]

where $n_1$ is the number of arrivals before time $d_1$, $n_3$ is the number of
arrivals after $T - d_2$, $S_1 = \sum_{i:\,x_i \le d_1}\ln(1 - x_i/T)$,
$S_2 = \sum_{i:\,d_1 < x_i \le T-d_2}\ln(1 - x_i/T)$ and $S_3 = \sum_{i:\,x_i > T-d_2}\ln(1 - x_i/T)$.
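A direct evaluation of (13) is the building block for both the grid search and the genetic algorithm. The sketch below is a minimal Python version, assuming that $C$ is the constant that normalizes the three-stage density (the reciprocal of the integrated unnormalized intensity) and that all bid times lie strictly inside $(0, T)$; the name `barista_loglik` is hypothetical.

```python
import math


def barista_loglik(x, a1, a2, a3, d1, d2, T):
    """Conditional log-likelihood (13) for bid times x in (0, T)."""
    A, B = 1.0 - d1 / T, d2 / T
    # Assumption: C normalizes the density, so C = 1 / (integrated intensity / c).
    mass = T * (A ** (a2 - a1) * (1 - A ** a1) / a1
                + (A ** a2 - B ** a2) / a2
                + B ** a2 / a3)
    C = 1.0 / mass
    logs = [(xi, math.log(1.0 - xi / T)) for xi in x]
    n1 = sum(1 for xi, _ in logs if xi <= d1)        # arrivals in stage 1
    n3 = sum(1 for xi, _ in logs if xi > T - d2)     # arrivals in stage 3
    S1 = sum(l for xi, l in logs if xi <= d1)
    S3 = sum(l for xi, l in logs if xi > T - d2)
    S2 = sum(l for _, l in logs) - S1 - S3           # remaining (middle) arrivals
    return (len(x) * math.log(C)
            + n1 * (a2 - a1) * math.log(A)
            + n3 * (a2 - a3) * math.log(B)
            + (a1 - 1) * S1 + (a2 - 1) * S2 + (a3 - 1) * S3)
```

Two special cases give quick checks: with all $\alpha$ equal to 1 the value is $-n\ln T$ (uniform bid times), and with a common $\alpha$ it reduces to the one-stage log-likelihood $n\ln(\alpha/T) + (\alpha-1)\sum_i \ln(1 - x_i/T)$.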

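In order to estimate $\alpha_1, \alpha_2, \alpha_3$ for given values of $d_1, d_2$, the following
three equations must be solved (equating the first derivatives in $\alpha_1, \alpha_2, \alpha_3$
to zero):

\[
S_1 = n_1\ln\left(1-\frac{d_1}{T}\right) - \frac{n}{C}\frac{\partial C}{\partial\alpha_1},
\tag{14}
\]

\[
S_2 = -n_1\ln\left(1-\frac{d_1}{T}\right) - n_3\ln\frac{d_2}{T} - \frac{n}{C}\frac{\partial C}{\partial\alpha_2},
\tag{15}
\]

\[
S_3 = n_3\ln\frac{d_2}{T} - \frac{n}{C}\frac{\partial C}{\partial\alpha_3},
\tag{16}
\]

where

\[
\frac{\partial C}{\partial\alpha_1} = \frac{C^2T}{\alpha_1^2}\left(1-\frac{d_1}{T}\right)^{\alpha_2-\alpha_1}
\left[1 + \alpha_1\ln\left(1-\frac{d_1}{T}\right) - \left(1-\frac{d_1}{T}\right)^{\alpha_1}\right],
\tag{17}
\]

\[
\begin{aligned}
\frac{\partial C}{\partial\alpha_2} = \frac{C^2T}{\alpha_1\alpha_2^2\alpha_3}\Biggl\{
&\alpha_3\left(1-\frac{d_1}{T}\right)^{\alpha_2}\left[\alpha_1 + \alpha_2\ln\left(1-\frac{d_1}{T}\right)
\left(\alpha_2-\alpha_1-\alpha_2\left(1-\frac{d_1}{T}\right)^{-\alpha_1}\right)\right] \\
&- \alpha_1\left(\frac{d_2}{T}\right)^{\alpha_2}\left[\alpha_3 + \alpha_2\ln\frac{d_2}{T}\,(\alpha_2-\alpha_3)\right]\Biggr\},
\end{aligned}
\tag{18}
\]

\[
\frac{\partial C}{\partial\alpha_3} = \frac{C^2T}{\alpha_3^2}\left(\frac{d_2}{T}\right)^{\alpha_2}.
\tag{19}
\]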

Since the equations are nonlinear in the parameters, they must be solved
iteratively, for example, with a gradient-based method such as Newton-Raphson
(the second derivatives are given in Appendix B) or the
Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, a more stable quasi-Newton
method that does not require the computation and inversion of the Hessian
matrix [see, e.g., Dennis and Schnabel (1983)]. If the changepoints $d_1$ and $d_2$
are unknown and we want to estimate them from the data, then the likelihood
must be computed for a grid of $d_1 \times d_2$ values, unless a search algorithm is
used; search algorithms such as genetic algorithms can be more efficient, more
stable and more easily programmable for finding a solution. In addition,
empirical evidence suggests that gradient methods tend to be unstable for
solving this maximization problem. In short, an exhaustive search over a
reasonable grid of the parameter space or a stochastic search algorithm is a
good practical solution. A good starting value is the estimate obtained from
the quick and crude method.

Genetic algorithm search. An alternative to an exhaustive search is the
genetic algorithm (GA). The genetic algorithm belongs to a general class
of stochastic, global optimization procedures that imitate the evolutionary
process of nature. The basic building blocks of GA are crossover, mutation
and selection, similar to their biological counterparts found in the evolution
of genes. GA is an iterative process and each iteration is called a new
generation. Starting from a parent population, two parents create offspring
via crossover and mutation. The crossover operator imitates the mixing of
genetic information during sexual reproduction. The mutation operator
imitates the occasional changes of genetic information due to external
influences. An offspring's fitness is evaluated relative to an objective function.
The offspring with the highest fitness is then selected for further reproduction.
Although GAs' operations appear heuristic, Holland (1975) provides
theoretical arguments for convergence to a high quality optimum.

Fig. 6. 500 generations of the GA for the simulated data.

We use a GA to estimate the parameters of the BARISTA process as follows.
After creating the parent population of size $S = 100$, we select the top
10% of parents with the highest fitness and perform crossover and mutation
on a randomly chosen pair, thereby creating a pair of new offspring. We repeat
this 50 times to obtain an offspring population of the same size $S$ as the
parent population. After creating a set of suitable offspring, the next step is
to evaluate an offspring's fitness. One approach is to evaluate an offspring's
fitness according to its likelihood value. Let $\theta$ denote an offspring and let
$L(\theta) = L(x_1, \ldots, x_n \mid \theta)$ denote the corresponding likelihood value. For two
offspring $\theta_1$ and $\theta_2$, $\theta_1$ has higher fitness if $L(\theta_1) > L(\theta_2)$.

We ran a GA on the simulated data, restricting the range of possible
solutions to the hypercube $(\alpha_1, \alpha_2, \alpha_3, d_1, d_2) \in [1,15] \times [0.1,1] \times [0.5,15] \times
[1,5] \times [0,0.01]$, and running it for a total of 500 generations. Figure 6 shows
a graph of the fitness history over the 500 generations. We see that after
generation 300, there are barely any further improvements. Our parameter
estimates are taken from the last generation. This yielded the estimates and
standard errors given in Table 1. All of these estimates are in line with the
quick and crude estimates, and are very close to the values used to generate
the data. The combined numerical maximization and grid search procedure
did not converge.
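The GA loop described above can be sketched as follows. This is a generic illustration rather than the authors' implementation: the text does not specify the crossover or mutation operators, so the uniform crossover, the Gaussian mutation with clipping, the mutation rate of 0.2, and the retention of the best parent in each generation are all assumptions, as is the function name `genetic_search`.

```python
import random


def genetic_search(fitness, bounds, pop_size=100, generations=500, seed=0):
    """Maximize `fitness` over a box; returns (best point, best-fitness history)."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: max(2, pop_size // 10)]      # top 10% of parents
        history.append(fitness(elite[0]))
        offspring = [elite[0]]                        # assumption: keep the best parent
        while len(offspring) < pop_size:
            p, q = rng.sample(elite, 2)               # randomly chosen pair
            # Assumption: uniform crossover, each gene taken from either parent.
            children = ([rng.choice(pair) for pair in zip(p, q)],
                        [rng.choice(pair) for pair in zip(q, p)])
            for child in children:
                # Assumption: occasional Gaussian jitter of one gene, clipped to bounds.
                if rng.random() < 0.2:
                    j = rng.randrange(len(bounds))
                    lo, hi = bounds[j]
                    step = rng.gauss(0.0, 0.05 * (hi - lo))
                    child[j] = min(hi, max(lo, child[j] + step))
                offspring.append(child)
        population = offspring[:pop_size]
    return max(population, key=fitness), history
```

To fit the BARISTA model, `fitness` would be the log-likelihood as a function of $(\alpha_1, \alpha_2, \alpha_3, d_1, d_2)$ and `bounds` the hypercube given above. Because the best parent is carried forward, the recorded fitness history never decreases, which mirrors the flattening curve reported in Figure 6.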

4.2.3. Model selection. Although we posit that a three-stage model is, at
least in general, most suitable for describing the bid arrival process in online
auctions, it is possible to extend the estimation process to include model
selection. To allow for a more flexible family of distributions, we consider the
family of one-stage (NHPP$_1$), two-stage (NHPP$_2$) and three-stage (BARISTA)
models. Since the first two are nested within the BARISTA model, we can
choose the best model using likelihood ratios. To compare a 3-stage with a
2-stage model, for instance, we use the statistic

\[
-2\{\mathcal{L}(\mathrm{NHPP}_2) - \mathcal{L}(\mathrm{BARISTA})\},
\tag{20}
\]

where $\mathcal{L}(i)$ is the log-likelihood for model $i$. Under the null hypothesis that
the models are equivalent in their ability to fit the data (i.e., the NHPP$_2$
is sufficient), the statistic follows a $\chi^2(p)$ distribution with $p = 5 - 3 = 2$
degrees of freedom (the difference in the number of parameters of the two
models). If the p-value is sufficiently small, then it is reasonable to choose the
3-stage model, whereas a large p-value would indicate the use of the 2-stage
model. A similar statistic can be designed to test the difference between the
1-stage and 2-stage models, which would also follow a $\chi^2$ distribution, again
with $p = 3 - 1 = 2$ degrees of freedom.

Table 1
True and estimated values (with standard errors) for the BARISTA model parameters, by method

                          $\hat\alpha_1$   $\hat\alpha_2$   $\hat\alpha_3$    $\hat d_1$       $\hat d_2$ (minutes)
Simulated (true) values   3                0.4              1                 2.5              5
CDF-based Q&C             2.85 (0.06)      0.443 (0.001)    0.954 (0.0132)    2.5 (0.0036)     4.7 (0.13)
Genetic algorithm         2.88 (0.007)     0.387 (0.005)    0.997 (0.009)     2.63 (0.0044)    4.6 (0.42)

Table 2
Information for 9 datasets on three different types of items and three different auction durations

                       7-day    5-day    3-day
Xbox      #bids        1861     393      557
          #auctions    93       21       35
Palm      #bids        3832     869      1216
          #auctions    194      54       35
Cartier   #bids        1348     355      250
          #auctions    97       21       18

This test statistic can be used in conjunction with any of the estimation
methods that we described. The most comprehensive and computationally
intensive option is to find the "best" 1-stage, "best" 2-stage and "best"
3-stage models (in the sense of the highest likelihood values), and compare
them using the likelihood-ratio test. A more practical alternative is to combine
the model selection with a stochastic search algorithm. In particular, we
incorporated model selection into the genetic algorithm as follows. Starting
with the simplest model, NHPP$_1$, we apply the GA to obtain the parameter
estimates and the associated log-likelihood value, $\mathcal{L}_1$. NHPP$_1$ is the most
parsimonious model and we only move to a more complex model if the data
justify that choice. We hence continue by fitting an NHPP$_2$ and obtaining
the log-likelihood value $\mathcal{L}_2$. We compute the likelihood-ratio statistic (20)
and the associated p-value. If the statistic is significant (p-value < 0.05),
then we discard the current model (NHPP$_1$), move to the better model
(NHPP$_2$), and repeat the process by comparing that model with the next
model (BARISTA). Alternatively, if the likelihood-ratio statistic is insignificant
(p-value > 0.05), we stop and retain the current model as the best
model.
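The sequential selection rule can be written compactly. The Python sketch below assumes that the three maximized log-likelihoods are already available (for instance, from separate GA runs) and uses the fact that a chi-square variable with 2 degrees of freedom has survival function $e^{-x/2}$, so no statistics library is needed. The function names and the input layout are illustrative.

```python
import math


def lr_test_two_df(loglik_small, loglik_big):
    """Likelihood-ratio statistic (20) and its chi-square(2) p-value."""
    stat = -2.0 * (loglik_small - loglik_big)
    # For 2 degrees of freedom, P(chi2 > x) = exp(-x / 2).
    p_value = math.exp(-max(stat, 0.0) / 2.0)
    return stat, p_value


def select_num_stages(logliks, level=0.05):
    """Sequential selection over [NHPP1, NHPP2, BARISTA] log-likelihoods.

    Moves to the next, richer model only while the test is significant.
    """
    current = 0
    for k in range(1, len(logliks)):
        _, p_value = lr_test_two_df(logliks[current], logliks[k])
        if p_value >= level:
            break                      # retain the current, simpler model
        current = k
    return current + 1                 # number of stages in the chosen model
```

For example, hypothetical log-likelihoods of -120, -110 and -109.5 would select the 2-stage model: the first comparison gives a statistic of 20, while the second gives a statistic of 1 with a p-value of about 0.61.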

5. Empirical results. 

5.1. Data. We collected data from eBay.com on closed auctions for three 
types of products: Palm M515 personal digital assistants, Microsoft Xbox 
games and Cartier premium wristwatches. The data include auctions of three 
different durations: 3-day, 5-day and 7-day auctions. Relevant statistics are 
given in Table 2. 

5.2. Estimation. We describe the estimation process only for the 7-day
Palm bid arrival times. The same approach was used to estimate parameters
for all other datasets, and we report the estimates for all datasets at
the end.

5.2.1. Initial quick and crude estimation. Based on previous empirical
results, we chose the first day for estimating $\alpha_1$, that is, we believe that bids
placed during the first day are contained within the first "early bidding"
stage. Looking at the estimate as a function of the interval chosen (Figure
7, left panel), we see that the estimate is between 4 and 5 if we use the first
1-2 days. It is interesting to note that after the first two days, the estimate
decreases progressively, reaching $\hat\alpha_1 = 2.5$ on the interval $[0.01, 3]$, indicating
that the changepoint $d_1$ is around 2.

The parameter $\alpha_3$ was estimated using (10) with $t_3' = 7 - 0.1/10080$ and
a range of values for $t_3$. From these, $\hat\alpha_3$ appears to be approximately 1. It
can be seen in the right panel of Figure 7 that this estimate is relatively
stable within the last 10 minutes. Also, notice that selecting $t_3$ too close
to $t_3'$ results in unreliable estimates (due to a small number of observations
between the two values).

Finally, we chose the interval $[3, 6.9]$ for estimating $\alpha_2$. This yielded the
estimate $\hat\alpha_2 = 0.36$. Figure 8 shows the estimate as a function of the interval
choice. Note that the estimate is stable between 0.2 and 0.4 for the different
intervals chosen. It is more sensitive to the choice of $t_2$, the upper bound
of the interval, and thus an overly conservative interval could yield large
inaccuracies.

Using these estimates ($\hat\alpha_1 = 4.3$, $\hat\alpha_2 = 0.36$, $\hat\alpha_3 = 1$), we estimated $d_1$ and
$d_2$. Figure 9 shows graphs of the estimates as a function of the intervals
selected. The estimate for $d_1$ (left panel) appears to be stable at approximately
$\hat d_1 = 1.75$. The estimate for $d_2$ (right panel) appears to be around
2 minutes. From the increasing values obtained for $T - t_3 > 3$ minutes, we
also learn that $d_2 < 3$.

5.3. Further refinement: ML and GA. Table 3 displays the above estimates
and compares them to the two other estimation methods: an exhaustive
search over a reasonable range of the parameter space (around the quick
and crude estimates) and the much quicker genetic algorithm. We restricted
the range of possible solutions for the genetic algorithm to the hypercube
$(\alpha_1, \alpha_2, \alpha_3, d_1, d_2) \in [0,10] \times [0,1] \times [0,5] \times [0,5] \times [0, 1000\ \text{min}]$. It can be seen
that all methods yielded estimates in the same vicinity.

We also performed model selection to see whether a 2-stage or 1-stage
model would sufficiently fit the data. The low p-values for comparing the
3-stage model with these models showed that indeed a 3-stage model is
preferable and more accurately approximates the data. For a detailed fitting
of the data to an NHPP$_1$ (which includes the Poisson as a special case) and
an NHPP$_2$, see Shmueli et al. (2004).

Finally, to further validate this estimated model, we simulated data from a
BARISTA process with the above ML estimates as parameters, and number
of bids equal to that in the respective dataset. Figure 10 shows a QQ-plot of
the Palm data vs. the simulated data. The points appear to fall on the line
$x = y$, thus supporting the adequacy of the estimated model for the Palm
bid times.

The estimated model for the Palm data reveals the dynamics of these
auctions over time: Indeed, the "average" auction has three stages: the initial
stage takes place during the first 1.7 days, the middle stage continues until
the last 2 minutes, and then the third stage kicks in. The bid arrivals in each
of the three stages have different intensity functions. The auction beginning
is characterized by an early surge of interest, with more intense bidding
than during the start of the second stage. Then, the increase in bid arrival
rate slows down during the middle of the auction. The bids do tend to
arrive faster as the auction progresses, but at the very end, during the last
2 minutes of the auction, we observe a uniform bid arrival process. Finally,
it is interesting to note that in these data the third stage of bidding seems
to take place within the last 2-3 minutes compared to the last 1 minute in
Roth and Ockenfels (2000). Thus, we use the term "last-moment bidding"
rather than "last-minute bidding."

Table 3
Estimates for five BARISTA model parameters using the three estimation methods

                     $\hat\alpha_1$   $\hat\alpha_2$   $\hat\alpha_3$   $\hat d_1$       $\hat d_2$ (minutes)
CDF-based Q&C        4.3 (0.02)       0.36 (0.001)     1 (0.02)         1.8 (0.009)      3.3 (0.18)
Exhaustive search    4.9              0.37             1.13             1.7              2.0
Genetic algorithm    5.55 (0.005)     0.35 (0.005)     1.1 (0.01)       1.55 (0.005)     2.0 (0.10)

Fig. 7. Quick and crude estimates of $\alpha_1$ as a function of $t_1$ (with $t_1' = 0.001$) (left) and
of $\alpha_3$ as a function of $t_3$ (with $t_3' = 0.5/10080$) (right). $\hat\alpha_1$ is stable around 5 for $t_1$ in the
range 0.75-1.75 days. A shorter interval does not contain enough data. A longer interval
leads to a drop in the estimate, indicating that $d_1 < 2$. $\hat\alpha_3$ is around 1.1 when $t_3$ is within
the last 2-4 minutes.

Fig. 8. Quick and crude estimate of $\alpha_2$ as a function of $[t_2', t_2]$. Shorter, "safer" intervals
are at the lower right. Longer intervals, containing more data, are at the upper left. $\hat\alpha_2$ is
between 0.2-0.4 for all intervals. For $t_2 > 6.9$, the estimate is approximately 0.35.

Fig. 9. Plots of $\hat d_1$ vs. $t_1$ (left) and $\hat d_2$ vs. initial values of $T - t_3$ (right) for Palm data.
The estimate for $d_1$ seems stable at $\approx 1.75$. $\hat d_2$ is approximately 2 minutes.

Fig. 10. Q-Q plot of Palm bid times vs. simulated data from a BARISTA process with
parameters $\alpha_1 = 4.9$, $\alpha_2 = 0.37$, $\alpha_3 = 1.13$, $d_1 = 1.7$, $d_2 = 2/10080$.

5.4. Comparing auctions. Table 4 gives the estimated BARISTA coeffi- 
cients for all nine datasets. In all cases the fit was obtained by the process 
described above, using a GA-based model selection procedure and refining 
the estimates using an exhaustive search over the likelihood. 

We can see that except for one dataset (5-day Xbox auctions), a three-stage
(BARISTA) model provided the best fit among the one-, two- and
three-stage models. Also, in all cases the final fit of the model to the data,
evaluated by examining QQ-plots, was very good. The last phase lasted a
couple of minutes or less in all auctions, irrespective of the auction duration,
and there was more consistency within a certain item than within a
certain duration. The first stage tends to last 1-2 days, with one exception
being 4 days (7-day Cartier auctions). Another common feature is the
magnitude of $\alpha_2$, around 0.3, indicating that during the second phase the
bidding frequency is similarly low across items and across auction
durations. The bidding intensity during the last short stage, $\alpha_3$, is
also typically around 1, with two datasets reaching nearly 3.

The parameter that varies most across datasets is $\alpha_1$, the "early bidding"
frequency. It appears that the 3-day auctions exhibit a lower level of early
bidding compared to the longer duration auctions. However, such a stage
still exists even in these "short" auctions.

Finally, from a computational standpoint, the GA with model selection
ran reasonably fast, with the longest estimation taking about 11 minutes.
The runtimes are summarized in Table 5.



Table 4 

Estimated BARISTA coefficients for all 9 datasets, using a GA-based model selection and 
an exhaustive search over the likelihood function of the selected set of models 
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7-days 
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5-days 
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0.32 


0.79 


1.8 
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3-days 
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1.8 
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1.01 


1.45 


1.4 




7-days 


3 
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0.29 


2.95 


4 


0.5 


Cartier 


5-days 


3 


3.1 
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2.4 


0.51 




3-days 
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Table 5 

Computational complexity of the GA: The table lists the
run-time of the GA (recorded in minutes) for the 9 data sets
described in Section 5

Data     Palm     Xbox     Cartier
3-day    3.27     1.35     0.45
5-day    2.19     1.09     1.03
7-day    11.27    5.34     3.54



6. Relating bidder arrivals and bid arrivals. The online auction literature
is rich with papers that assume an ordinary homogeneous Poisson bidder
arrival process. This assumption underlies various theoretical derivations,
is the basis for the simulation of bid data, and is used to design
field experiments. Bajari and Hortacsu (2000) specify and estimate a structural
econometric model of bidding on eBay, assuming a Poisson bidder
arrival process. Etzion et al. (2003) suggest a model for segmenting consumers
at dual channel online merchants. Based on the assumption of Poisson
arrivals to the website, they model consumer choice of channel, simulate
consumer arrivals and actions, and compute relationships between auction
duration, lot size and the constant Poisson arrival rate $\lambda$. Zhang et al.
(2002) model the demand curve for consumer products in online auctions
based on Poisson bidder arrivals, and fit the model to bid data. Pinker et al.
(2003) and Vakrat and Seidmann (2000) use a Poisson process for modeling
the arrival of bidders in going-going-gone auctions. They use the intensity
function $\lambda(t) = \lambda_a e^{-t/T}$, $0 \le t \le T$, where $T$ is the auction duration,
and $\lambda_a$ is the intensity of website traffic into the auction. This model describes
the decline in the number of new bidders as the auction progresses.
Haubl and Popkowski Leszczyc (2003) design and carry out an experiment
for studying the effect of fixed-price charges (e.g., shipping costs) and reserve
prices on consumers' product valuations. The experiment uses simulated data
that are based on Poisson arrivals of bidders. These studies are among the
many that rely on a Poisson arrival process assumption.

In online auctions, however, bidder arrivals are unobserved as we pointed 
out earlier. Therefore, it is not straightforward to study their distribution. 
On the other hand, bid arrivals are observed. In the following we investigate 
the relationship between bidder arrivals and bid arrivals more carefully. 

6.1. Poisson bidder arrivals yield NHPP bid arrivals. We now establish a
key connection between the bidder arrival and bid arrival processes. Suppose
that bidders enter an auction in accordance with a Poisson process having a
fixed rate $\lambda$, and that a bidder who arrives at time $s$ places a single bid on
the interval $[s, T)$ according to a bid time distribution $G_s$. The resulting bid
arrival process is similar to the output process of an $M/G/\infty$ queue, except
that the service time (the elapsed time between a bidder's arrival and the
placement of his bid) is dependent on the arrival time of the bidder.

By Proposition 2.3.2 of Ross (1995), the bid counts on nonoverlapping
subintervals of $[0, T]$ are independent variables, and hence the bid arrival
process possesses independent increments. Moreover, the bid count on $[0, y]$
is Poisson distributed with mean

\[
\lambda\int_0^y G_s(y)\,ds = \lambda\int_0^y\!\int_0^t g_s(t)\,ds\,dt \quad (\text{if } G_s \text{ has a derivative } g_s),
\]

and hence, if $G_s$ has derivative $g_s$, then the bid arrival process is a
nonhomogeneous Poisson process with intensity function $\lambda(t) = \lambda\int_0^t g_s(t)\,ds$. The
function $g_s$ is the link between the bidder arrival process and the resulting
bid arrival process. Suppose, for example, that a bidder who arrives at
time $s$ places a single bid uniformly on the remaining interval $(s, T)$. Then,
$g_s(t) = 1/(T - s)$ so that $\lambda(t) = \lambda\log(T/(T - t))$. Thus, fixed rate Poisson
(uniform) bidder arrivals, in conjunction with uniform placement of their
single bids, yield a nonhomogeneous Poisson process of bid arrivals with an
intensity that increases as the auction deadline approaches.
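For the uniform-placement example, the intensity and the expected bid count can be worked out explicitly. The short derivation below is offered as a check on the stated result; it uses only the definitions of this section, with $\lambda$ the bidder arrival rate and $g_s(t) = 1/(T-s)$ for $s < t < T$.

```latex
\begin{align*}
\lambda(t) &= \lambda\int_0^t \frac{ds}{T-s}
            = \lambda\bigl[-\ln(T-s)\bigr]_{s=0}^{s=t}
            = \lambda\ln\frac{T}{T-t}, \qquad 0 \le t < T, \\[4pt]
\int_0^y \lambda(t)\,dt
           &= \lambda\left[\,y - (T-y)\ln\frac{T}{T-y}\right], \qquad 0 \le y < T .
\end{align*}
```

Letting $y \to T$, the expected number of bids tends to $\lambda T$, the expected number of bidders, as it should when every bidder places exactly one bid before the deadline.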



6.2. Poisson bidder arrivals yield BARISTA bid arrivals. Continuing the
discussion in Section 6.1, we describe a naturally arising bid time distribution
that yields a BARISTA bid arrival process. Suppose that a bidder who
arrives at time $s_1$ places his bid immediately (at time $s_1$) with probability
$\alpha > 0$, and makes no further bids, or otherwise selects $s_2 \sim U(s_1, T)$ as his
next potential bid time. At time $s_2$ he again places his bid with probability
$\alpha$, and makes no further bids, or otherwise selects $s_3 \sim U(s_2, T)$ as his next
potential bid time. Continuing in this manner, the bidder eventually places
his single bid and then departs the auction. By the discussion in Section 6.1,
the resulting bid arrival process is a nonhomogeneous Poisson process.

To derive the form of the intensity function, let $\tau$ denote the bid time of
a randomly chosen bidder. The intensity function $\lambda$ is a constant multiple
of $\frac{d}{dt}P(\tau \le t)$. By Lemma 1 of Appendix C (set $a = 0$ and $b = T$), we have

\[
\lambda(t) = (\text{const})\frac{d}{dt}P(\tau \le t)
= (\text{const})\frac{d}{dt}\left\{1 - \left(1 - \frac{t}{T}\right)^{\alpha}\right\}
= (\text{const})\left(1 - \frac{t}{T}\right)^{\alpha-1}, \qquad 0 \le t \le T.
\tag{21}
\]

This is the intensity function of our NHPP$_1$.

To obtain NHPP$_2$ bid arrivals with $\alpha_2 = \alpha$ and $\alpha_3 = 1$, we alter the above
set-up so that a bidder who selects a potential bid time after $T - d$ either
places a bid at that time (with probability $\alpha$) or otherwise leaves the auction
without placing a bid (or the one bid he attempts to place on $[T - d, T]$ fails
to transmit). Note that this variation produces uniformly distributed bid
arrivals on $(T - d, T]$.

To generalize further to NHPP$_2$ bid arrivals with $0 < \alpha_2 < \alpha_3 < 1$, we
replace $\alpha$ by $\alpha_2$ on $[0, T - d]$, and by $\alpha_3$ on $(T - d, T]$. We also assign
probability $1 - \alpha_2/\alpha_3$ to the event that a bidder who has not placed his bid
by time $T - d$ ultimately departs the auction without bidding. Let $\tau$ denote
the bid time of a bidder chosen randomly from those who successfully place
a bid. By multiple applications of Lemma 1 of Appendix C,

\[
P(\tau \le t) =
\begin{cases}
\dfrac{1}{\pi}\left[1 - \left(1 - \dfrac{t}{T}\right)^{\alpha_2}\right], & \text{for } 0 \le t \le T - d, \\[10pt]
1 - \dfrac{1}{\pi}\,\dfrac{\alpha_2}{\alpha_3}\left(\dfrac{d}{T}\right)^{\alpha_2}\left(\dfrac{T - t}{d}\right)^{\alpha_3}, & \text{for } T - d \le t \le T,
\end{cases}
\]

where $\pi$ denotes the probability that a randomly chosen bidder successfully
places a bid [$\pi$ is equal to $1 - (d/T)^{\alpha_2}(1 - \alpha_2/\alpha_3)$]. We have for some $c > 0$,

\[
\lambda(t) = (\text{const})\frac{d}{dt}P(\tau \le t) =
\begin{cases}
c\left(1 - \dfrac{t}{T}\right)^{\alpha_2 - 1}, & \text{for } 0 \le t \le T - d, \\[10pt]
c\left(\dfrac{d}{T}\right)^{\alpha_2 - \alpha_3}\left(1 - \dfrac{t}{T}\right)^{\alpha_3 - 1}, & \text{for } T - d \le t \le T.
\end{cases}
\]

This is the intensity function of our NHPP$_2$.

7. Discussion. Empirical research on bid timing in online auctions has 
been exploratory and data-driven. The BARISTA model is the first pro- 
posed probabilistic model for the bid arrival process in online auctions. This 
probabilistic foundation provides an improved platform for quantifying bid 
arrival processes and for simulating data, and is a first step for establishing 
models of bidder strategies (as shown in Section 6). It can also be used to
improve nonparametric representations of price processes in online auctions
[e.g., those used for clustering auctions in Jank and Shmueli (2007) or for
forecasting ongoing auctions in Wang et al. (2007)], by specifying the number
and locations of knots in the smoothing splines.

One possible extension of the current BARISTA formulation is to auctions 
of random duration. An alternative to the popular fixed-length format (as on 
eBay) is a format whereby the closing time is extended, beyond the scheduled 
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deadline if necessary, until a predetermined number of minutes have passed 
without a bid being submitted. This is known as a popcorn ending (as on 
Amazon.com). Extending the BARISTA model in this direction is likely to 
require an additional fourth phase and a modification to the bid intensity 
in the third phase, as the parameter T has a different role in the alternative 
format. Formulating such a model and fitting it to real data is an interesting 
future direction. An enhanced BARISTA that models the bivariate process 
of bid arrivals and bid amounts is also of interest, as is the probabilistic 
formulation of bidder arrivals and strategies that lead to BARISTA bid 
arrivals. 



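APPENDIX A: ML ESTIMATION OF THE UNCONDITIONAL BARISTA MODEL

Let $N(s)$, $0 \le s \le T$, be a NHPP with an intensity function of the form

\[
\lambda(s) = c\,g(\theta, s), \qquad 0 \le s \le T,
\]

where $c$ and $\theta = (\theta_1, \ldots, \theta_k)$ are unknown parameters. Define $h(\theta) =
\int_0^T g(\theta, s)\,ds$, so that $m(T) = c\,h(\theta)$. The pdf associated with $\lambda$ is $f(\theta, s) =
\lambda(s)/m(T)$, $0 \le s \le T$. Given a random sample $x_1, \ldots, x_n$ (nonrandom $n$)
from this distribution, the likelihood and log-likelihood functions of $\theta$ are

\[
L(\theta) = \prod_{i=1}^{n} f(\theta, x_i) \quad\text{and}\quad \mathcal{L}(\theta) = \ln L(\theta).
\]

On the other hand, given the value $n$ of $N(T)$, and the arrival times $x_1, \ldots, x_n$
from the NHPP, the likelihood function of $(c, \theta)$ is given by

\[
L(c, \theta) = \frac{e^{-m(T)}\,m(T)^n}{n!}\prod_{i=1}^{n} f(\theta, x_i)
= \frac{e^{-c\,h(\theta)}\,[c\,h(\theta)]^n}{n!}\,L(\theta).
\]

The log-likelihood is thus

\[
\mathcal{L}(c, \theta) = -c\,h(\theta) + n\ln c + n\ln h(\theta) - \ln n! + \mathcal{L}(\theta).
\]

The joint MLE of $c$ and $\theta$ is the solution of the equations

\[
\begin{aligned}
\frac{\partial\mathcal{L}(c,\theta)}{\partial c} &= -h(\theta) + \frac{n}{c} = 0, \\
\frac{\partial\mathcal{L}(c,\theta)}{\partial\theta_j} &= -c\,\frac{\partial h(\theta)}{\partial\theta_j}
+ \frac{n}{h(\theta)}\frac{\partial h(\theta)}{\partial\theta_j}
+ \frac{\partial\mathcal{L}(\theta)}{\partial\theta_j} = 0, \qquad 1 \le j \le k.
\end{aligned}
\tag{A.1}
\]

Solving the first equation in (A.1) for $c$ and plugging into the second, we
find that

\[
\frac{\partial\mathcal{L}(c,\theta)}{\partial\theta_j} = \frac{\partial\mathcal{L}(\theta)}{\partial\theta_j}, \qquad 1 \le j \le k.
\]

Hence, $L(c, \theta)$ and $L(\theta)$ yield the same MLE for $\theta$. That is, if $\hat\theta_j =
W_j(X_1, \ldots, X_n)$ is the MLE of $\theta_j$ ($1 \le j \le k$) based on a random sample
of nonrandom size $n$ from the distribution with the pdf above, then the
MLE of $\theta_j$ based on the arrival times $X_1, \ldots, X_{N(T)}$ from the above NHPP
is of the form $\hat\theta_j = W_j(X_1, \ldots, X_{N(T)})$. By the first equation in (A.1), the
MLE of $c$ is

\[
\hat c = \frac{N(T)}{h(\hat\theta)}.
\tag{A.2}
\]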

APPENDIX B: SECOND DERIVATIVES OF THE LOG-LIKELIHOOD 

FUNCTION 

The second derivatives are given for using gradient methods of ML esti- 
mation such as Newton Raphson: 
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APPENDIX C: A GEOMETRIC SERIES OF SHRINKING UNIFORM 

VARIABLES 

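Lemma 1. Suppose $X_1 \sim U(a, b)$, $X_2 \sim U(X_1, b)$, $X_3 \sim U(X_2, b), \ldots$ and
$M \sim \mathrm{geom}(\alpha)$. Then,

\[
P(X_M > s) = \left(1 - \frac{s - a}{b - a}\right)^{\alpha}, \qquad a \le s \le b.
\]

Proof. For convenience, assume $a = 0$ and $b = 1$. Define $p(s) = P(X_M > s)$.
For small $x$,

\[
p(s + x) = p(s)\,P(X_M > s + x \mid X_M > s)
\ge p(s)\left[\left(1 - \frac{x}{1-s}\right) + \left(\frac{x}{1-s}\right)(1 - \alpha)\left(1 - \frac{x}{1-s}\right)\right],
\]

and thus,

\[
\liminf_{x \to 0}\frac{p(s + x) - p(s)}{x} \ge \frac{-\alpha\,p(s)}{1 - s}.
\tag{C.1}
\]

Moreover,

\[
p(s + x) \le p(s)\left[\left(1 - \frac{x}{1-s}\right) + \frac{x(1 - \alpha)}{1-s}\right],
\]

so that

\[
\limsup_{x \to 0}\frac{p(s + x) - p(s)}{x} \le \frac{-\alpha\,p(s)}{1 - s}.
\tag{C.2}
\]

By (C.1) and (C.2), $p'(s) = -\alpha\,p(s)(1 - s)^{-1}$, from which we conclude that
$p(s) = (1 - s)^{\alpha}$. □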
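Lemma 1 can also be checked by simulation. The Python sketch below draws bid times by following the shrinking-uniform scheme literally, with $a = 0$ and $b = T$ and $M$ taken as the number of draws up to and including the first success, and compares the empirical CDF with $1 - (1 - t/T)^{\alpha}$. The function names, the sample size and the seed are arbitrary choices made for illustration.

```python
import random


def simulate_bid_time(alpha, T, rng):
    """One bidder: X_1 ~ U(0, T); bid with probability alpha, else redraw later."""
    t = rng.uniform(0.0, T)
    while rng.random() >= alpha:       # no bid at this potential bid time
        t = rng.uniform(t, T)          # next potential time, uniform on (t, T)
    return t


def empirical_cdf_at(t, alpha, T, n=20000, seed=1):
    """Fraction of n simulated bid times that fall at or before t."""
    rng = random.Random(seed)
    hits = sum(simulate_bid_time(alpha, T, rng) <= t for _ in range(n))
    return hits / n
```

For instance, with $\alpha = 0.5$ and $T = 7$, the lemma gives $P(\tau \le 3.5) = 1 - 0.5^{0.5} \approx 0.293$, and the simulated fraction should agree with this value up to Monte Carlo error.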
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